
LAD-BNet: Lag-Aware Dual-Branch Networks for Real-Time Energy Forecasting on Edge Devices

Lignier, Jean-Philippe

arXiv.org Machine Learning

Real-time energy forecasting on edge devices represents a major challenge for smart grid optimization and intelligent buildings. We present LAD-BNet (Lag-Aware Dual-Branch Network), an innovative neural architecture optimized for edge inference with Google Coral TPU. Our hybrid approach combines a branch dedicated to explicit exploitation of temporal lags with a Temporal Convolutional Network (TCN) featuring dilated convolutions, enabling simultaneous capture of short- and long-term dependencies. Tested on real energy consumption data with 10-minute temporal resolution, LAD-BNet achieves 14.49% MAPE at a 1-hour horizon with only 18 ms inference time on Edge TPU, an 8-12x acceleration compared to CPU. The multi-scale architecture enables predictions up to 12 hours ahead with controlled performance degradation. Our model demonstrates a 2.39% improvement over LSTM baselines and 3.04% over pure TCN architectures, while maintaining a 180 MB memory footprint suitable for embedded device constraints. These results pave the way for industrial applications in real-time energy optimization, demand management, and operational planning.
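As a concrete illustration of the TCN branch described above, here is a minimal sketch (not the authors' code; function names are our own) of a 1-D causal dilated convolution, together with the receptive-field formula that explains how stacked dilations let a small kernel cover long lag horizons:

```python
# Sketch of a causal dilated convolution, the building block of a TCN branch.
# Causality: output[t] depends only on x[t], x[t-d], x[t-2d], ...

def causal_dilated_conv1d(x, weights, dilation):
    """Apply a causal 1-D convolution with the given dilation."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            j = t - i * dilation  # look back i*dilation steps
            if j >= 0:
                acc += w * x[j]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations):
    """Past context seen by a stack of dilated conv layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Four layers, kernel 3, dilations 1, 2, 4, 8 -> 31 past steps of context.
rf = receptive_field(3, [1, 2, 4, 8])
```

Doubling the dilation at each layer grows the receptive field exponentially with depth, which is what makes TCNs attractive for long-range dependencies at low compute cost.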


CHURRO: Making History Readable with an Open-Weight Large Vision-Language Model for High-Accuracy, Low-Cost Historical Text Recognition

Semnani, Sina J., Zhang, Han, He, Xinyan, Tekgürler, Merve, Lam, Monica S.

arXiv.org Artificial Intelligence

Accurate text recognition for historical documents can greatly advance the study and preservation of cultural heritage. Existing vision-language models (VLMs), however, are designed for modern, standardized texts and are not equipped to read the diverse languages and scripts, irregular layouts, and frequent degradation found in historical materials. This paper presents CHURRO, a 3B-parameter open-weight VLM specialized for historical text recognition. The model is trained on CHURRO-DS, the largest historical text recognition dataset to date. CHURRO-DS unifies 155 historical corpora comprising 99,491 pages, spanning 22 centuries of textual heritage across 46 language clusters, including historical variants and dead languages. We evaluate several open-weight and closed VLMs and optical character recognition (OCR) systems on CHURRO-DS and find that CHURRO outperforms all other VLMs. On the CHURRO-DS test set, CHURRO achieves 82.3% (printed) and 70.1% (handwritten) normalized Levenshtein similarity, surpassing the second-best model, Gemini 2.5 Pro, by 1.4% and 6.5%, respectively, while being 15.5 times more cost-effective. By releasing the model and dataset, we aim to enable community-driven research to improve the readability of historical texts and accelerate scholarship.
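The evaluation metric the abstract reports, normalized Levenshtein similarity, can be sketched in a few lines (a standard definition, not code from the paper): similarity is one minus the edit distance divided by the longer string's length.

```python
# Normalized Levenshtein similarity between a reference transcription
# and a model hypothesis; 1.0 means a perfect match.

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalized_similarity(ref, hyp):
    if not ref and not hyp:
        return 1.0
    return 1.0 - levenshtein(ref, hyp) / max(len(ref), len(hyp))
```

Under this metric, the reported 82.3% on printed text means the model's output is, on average, within roughly 18% edit operations of the reference per character of the longer string.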



Instituto de Telecomunicações at IWSLT 2025: Aligning Small-Scale Speech and Language Models for Speech-to-Text Learning

Attanasio, Giuseppe, Sannigrahi, Sonal, Peters, Ben, Martins, André F. T.

arXiv.org Artificial Intelligence

This paper presents the IT-IST submission to the IWSLT 2025 Shared Task on Instruction Following Speech Processing. We submit results for the Short Track, i.e., speech recognition, translation, and spoken question answering. Our model is a unified speech-to-text model that integrates a pre-trained continuous speech encoder and text decoder through a first phase of modality alignment and a second phase of instruction fine-tuning. Crucially, we focus on using small-scale language model backbones (< 2B) and restrict to high-quality, CC-BY data along with synthetic data generation to supplement existing resources.


Quantum Adaptive Self-Attention for Quantum Transformer Models

Chen, Chi-Sheng, Kuo, En-Jui

arXiv.org Artificial Intelligence

Transformer models have revolutionized sequential learning across various domains, yet their self-attention mechanism incurs quadratic computational cost, posing limitations for real-time and resource-constrained tasks. To address this, we propose Quantum Adaptive Self-Attention (QASA), a novel hybrid architecture that enhances classical Transformer models with a quantum attention mechanism. QASA replaces dot-product attention with a parameterized quantum circuit (PQC) that adaptively captures inter-token relationships in the quantum Hilbert space. Additionally, a residual quantum projection module is introduced before the feedforward network to further refine temporal features. Our design retains classical efficiency in earlier layers while injecting quantum expressiveness in the final encoder block, ensuring compatibility with current NISQ hardware. Experiments on synthetic time-series tasks demonstrate that QASA achieves faster convergence and superior generalization compared to both standard Transformers and reduced classical variants. Preliminary complexity analysis suggests potential quantum advantages in gradient computation, opening new avenues for efficient quantum deep learning models.
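To make the quadratic cost concrete, here is a plain classical scaled dot-product attention (an assumption-free textbook baseline, not the paper's quantum circuit): each of the n queries scores all n keys, so the score matrix alone costs O(n²).

```python
import math

# Standard scaled dot-product attention over n tokens of dimension d.
# The n x n score matrix is the quadratic bottleneck QASA targets.

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    d = len(Q[0])
    n = len(Q)
    out = []
    for i in range(n):
        # n scores per query -> n*n scores in total: O(n^2)
        scores = [sum(Q[i][t] * K[j][t] for t in range(d)) / math.sqrt(d)
                  for j in range(n)]
        w = softmax(scores)
        out.append([sum(w[j] * V[j][t] for j in range(n))
                    for t in range(len(V[0]))])
    return out
```

Replacing the dot-product scoring with a parameterized quantum circuit, as QASA proposes, changes how inter-token relationships are computed while keeping this overall query-key-value structure.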


Foundation Models -- A Panacea for Artificial Intelligence in Pathology?

Mulliqi, Nita, Blilie, Anders, Ji, Xiaoyi, Szolnoky, Kelvin, Olsson, Henrik, Boman, Sol Erika, Titus, Matteo, Gonzalez, Geraldine Martinez, Mielcarz, Julia Anna, Valkonen, Masi, Gudlaugsson, Einar, Kjosavik, Svein R., Asenjo, José, Gambacorta, Marcello, Libretti, Paolo, Braun, Marcin, Kordek, Radzislaw, Łowicki, Roman, Hotakainen, Kristina, Väre, Päivi, Pedersen, Bodil Ginnerup, Sørensen, Karina Dalsgaard, Ulhøi, Benedicte Parm, Ruusuvuori, Pekka, Delahunt, Brett, Samaratunga, Hemamali, Tsuzuki, Toyonori, Janssen, Emilius A. M., Egevad, Lars, Eklund, Martin, Kartasalo, Kimmo

arXiv.org Artificial Intelligence

The role of artificial intelligence (AI) in pathology has evolved from aiding diagnostics to uncovering predictive morphological patterns in whole slide images (WSIs). Recently, foundation models (FMs) leveraging self-supervised pre-training have been widely advocated as a universal solution for diverse downstream tasks. However, open questions remain about their clinical applicability and generalization advantages over end-to-end learning using task-specific (TS) models. Here, we focused on AI with clinical-grade performance for prostate cancer diagnosis and Gleason grading. We present the largest validation of AI for this task, using over 100,000 core needle biopsies from 7,342 patients across 15 sites in 11 countries. We compared two FMs with a fully end-to-end TS model in a multiple instance learning framework. Our findings challenge assumptions that FMs universally outperform TS models. While FMs demonstrated utility in data-scarce scenarios, their performance converged with - and was in some cases surpassed by - TS models when sufficient labeled training data were available. Notably, extensive task-specific training markedly reduced clinically significant misgrading, misdiagnosis of challenging morphologies, and variability across different WSI scanners. Additionally, FMs used up to 35 times more energy than the TS model, raising concerns about their sustainability. Our results underscore that while FMs offer clear advantages for rapid prototyping and research, their role as a universal solution for clinically applicable medical AI remains uncertain. For high-stakes clinical applications, rigorous validation and consideration of task-specific training remain critically important. We advocate for integrating the strengths of FMs and end-to-end learning to achieve robust and resource-efficient AI pathology solutions fit for clinical use.
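The multiple instance learning framework mentioned above treats each whole slide image as a "bag" of patch-level features and aggregates them into one slide-level prediction. A minimal sketch under our own simplifying assumptions (scalar patch scores, illustrative function names, not the study's code):

```python
import math

# Three common MIL pooling rules for turning patch scores into a slide score.

def mil_max_pool(patch_scores):
    """Slide is as positive as its most suspicious patch."""
    return max(patch_scores)

def mil_mean_pool(patch_scores):
    """Average evidence across all patches in the bag."""
    return sum(patch_scores) / len(patch_scores)

def mil_attention_pool(patch_scores, attn_logits):
    """Weight each patch by a (learned) attention logit before pooling."""
    m = max(attn_logits)
    w = [math.exp(a - m) for a in attn_logits]
    s = sum(w)
    return sum(wi / s * p for wi, p in zip(w, patch_scores))
```

In practice the attention logits come from a small network trained end-to-end with the slide-level label, which is what lets either a foundation-model encoder or a task-specific encoder plug into the same aggregation head.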


Quantum Recurrent Neural Networks with Encoder-Decoder for Time-Dependent Partial Differential Equations

Chen, Yuan, Khaliq, Abdul, Furati, Khaled M.

arXiv.org Artificial Intelligence

Nonlinear time-dependent partial differential equations are essential in modeling complex phenomena across diverse fields, yet they pose significant challenges due to their computational complexity, especially in higher dimensions. This study explores Quantum Recurrent Neural Networks within an encoder-decoder framework, integrating Variational Quantum Circuits into Gated Recurrent Units and Long Short-Term Memory networks. We evaluate the algorithms on the Hamilton-Jacobi-Bellman equation, Burgers' equation, the Gray-Scott reaction-diffusion system, and the three-dimensional Michaelis-Menten reaction-diffusion equation. The results demonstrate the superior performance of the quantum-based algorithms in capturing nonlinear dynamics, handling high-dimensional spaces, and providing stable solutions, highlighting their potential as an innovative tool for solving challenging and complex systems. Partial differential equations (PDEs) are fundamental mathematical tools for modeling diverse phenomena in fields such as physics, biology, chemistry, and economics. However, for many complex and high-dimensional PDEs, analytical solutions are often unattainable. To address this, numerical methods such as the finite-difference method (FDM) [1], finite-element method (FEM) [2], and finite-volume method (FVM) [3] have been developed to approximate solutions. These techniques have been effective in a variety of applications but face limitations in computational complexity, stability, and scalability, especially when applied to non-linear or high-dimensional problems.
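As a reference point for the classical baselines mentioned above, here is a minimal finite-difference sketch (our own illustrative example, not from the paper) of an explicit Euler step for the 1-D heat equation u_t = ν·u_xx, with r = ν·Δt/Δx², stable for r ≤ 0.5:

```python
# Explicit finite-difference step for u_t = nu * u_xx on a uniform grid,
# with fixed (Dirichlet) boundary values. r = nu * dt / dx**2.

def fdm_heat_step(u, r):
    """One forward-Euler step of the second-order central-difference stencil."""
    return ([u[0]]
            + [u[i] + r * (u[i + 1] - 2 * u[i] + u[i - 1])
               for i in range(1, len(u) - 1)]
            + [u[-1]])

def fdm_heat(u, r, steps):
    for _ in range(steps):
        u = fdm_heat_step(u, r)
    return u
```

The cost of such schemes grows rapidly with dimension (a grid of N points per axis needs N^d unknowns), which is precisely the scalability limitation motivating learned solvers.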


MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Gaido, Marco, Papi, Sara, Bentivogli, Luisa, Brutti, Alessio, Cettolo, Mauro, Gretter, Roberto, Matassoni, Marco, Nabih, Mohamed, Negri, Matteo

arXiv.org Artificial Intelligence

The rise of foundation models (FMs), coupled with regulatory efforts addressing their risks and impacts, has sparked significant interest in open-source models. However, existing speech FMs (SFMs) fall short of full compliance with the open-source principles, even if claimed otherwise, as no existing SFM has model weights, code, and training data publicly available under open-source terms. In this work, we take the first step toward filling this gap by focusing on the 24 official languages of the European Union (EU). We collect suitable training data by surveying automatic speech recognition datasets and unlabeled speech corpora under open-source compliant licenses, for a total of 950k hours. Additionally, we release automatic transcripts for 441k hours of unlabeled data under the permissive CC-BY license, thereby facilitating the creation of open-source SFMs for the EU languages.


Anomaly Detection from a Tensor Train Perspective

Ali, Alejandro Mata, de Leceta, Aitor Moreno Fdez., Rubio, Jorge López

arXiv.org Artificial Intelligence

We present a series of algorithms in tensor networks for anomaly detection in datasets, by using data compression in a Tensor Train representation. These algorithms consist of preserving the structure of normal data in compression and deleting the structure of anomalous data. The algorithms can be applied to any tensor network representation. We test the effectiveness of the methods with digits and Olivetti faces datasets and a cybersecurity dataset to determine cyber-attacks.
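The core idea, that normal data survives compression while anomalies lose structure, can be illustrated with a pure-Python stand-in for the paper's tensor-network compression (a rank-1 approximation via power iteration; names and setup are our own, much simpler than an actual Tensor Train):

```python
# Fit the leading direction of "normal" data, then score new samples by
# reconstruction error: anomalies compress poorly and get high scores.

def rank1_direction(rows, iters=50):
    """Leading right-singular direction of the data matrix via power iteration."""
    d = len(rows[0])
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]   # A v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]  # A^T A v
        norm = sum(x * x for x in w) ** 0.5 or 1.0
        v = [x / norm for x in w]
    return v

def anomaly_score(x, v):
    """Residual norm after projecting x onto the normal direction."""
    c = sum(xi * vi for xi, vi in zip(x, v))
    return sum((xi - c * vi) ** 2 for xi, vi in zip(x, v)) ** 0.5
```

A Tensor Train generalizes this to high-order tensors by chaining truncated SVDs along each mode, but the anomaly-scoring logic, compress with the structure learned from normal data and flag large residuals, is the same.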